Basic Premise

Making plots is very repetitive: draw this line, add these colored points, then add these, etc. Instead of rewriting the same code over and over, ggplot wraps these common steps in a high-level but very expressive API. The result is less time spent creating your charts and more time interpreting what they mean.

ggplot is not a good fit for people trying to make highly customized data visualizations. While you can make some very intricate, great-looking plots, ggplot sacrifices fine-grained customization in favor of generally doing "what you'd expect".

Data

ggplot has a symbiotic relationship with pandas. If you're planning on using ggplot, it's best to keep your data in DataFrames. Think of a DataFrame as a tabular data object. For example, let's look at the diamonds dataset which ships with ggplot.


In [1]:
%matplotlib inline
from ggplot import *
diamonds.head()


Out[1]:
   carat      cut color clarity  depth  table  price     x     y     z
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31
3   0.29  Premium     I     VS2   62.4   58.0    334  4.20  4.23  2.63
4   0.31     Good     J     SI2   63.3   58.0    335  4.34  4.35  2.75
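To plot your own data instead of a bundled dataset, put it in a DataFrame first. Here is a minimal sketch, assuming pandas is installed, that builds a small frame by hand. It keeps only three columns, and their values are copied from the first rows of the diamonds output above.

```python
import pandas as pd

# Each dict key becomes a column; each list holds that column's values, row by row.
# Values are copied from the first three rows of diamonds.head() above.
df = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23],
    "cut": ["Ideal", "Premium", "Good"],
    "price": [326, 326, 327],
})

print(df.head())           # the same tabular view as diamonds.head()
print(df["price"].mean())  # columns are selected by name
```

The column names in a frame like this are the strings you later pass to aes(), e.g. aes(x='carat', y='price').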

Aesthetics

Aesthetics describe how the columns of your data map onto the visual properties of your plot. Some common aesthetics are x, y, and color. Aesthetics are specific to the type of plot (or layer) you're adding to your visual. For example, a scatterplot (geom_point) and a line chart (geom_line) share x and y, but only a line chart has a linetype aesthetic.


In [2]:
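aes(x='carat', y='price')       # carat on the x-axis, price on the y-axis
aes(x='price', fill='clarity')  # price on the x-axis, fill color by clarity
aes(x='date', y='beef')         # only the last expression in a cell is echoed below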


Out[2]:
{'y': 'beef', 'x': 'date'}
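Conceptually, an aesthetic mapping is a lookup from a visual property to a column name, and the output of aes() above is printed as a dict. The pure-Python sketch below illustrates that idea only: resolve_aesthetics is a hypothetical helper made up for this example, not part of ggplot. It applies a mapping to the first rows of the diamonds output.

```python
# Rows copied from the diamonds.head() output above (subset of columns).
rows = [
    {"carat": 0.23, "cut": "Ideal", "price": 326},
    {"carat": 0.21, "cut": "Premium", "price": 326},
    {"carat": 0.23, "cut": "Good", "price": 327},
]

def resolve_aesthetics(mapping, rows):
    """Hypothetical helper: turn {aesthetic: column} into {aesthetic: [values]}."""
    return {aesthetic: [row[column] for row in rows]
            for aesthetic, column in mapping.items()}

mapping = {"x": "carat", "y": "price", "color": "cut"}
resolved = resolve_aesthetics(mapping, rows)
print(resolved["x"])      # → [0.23, 0.21, 0.23]  (x positions)
print(resolved["color"])  # → ['Ideal', 'Premium', 'Good']  (one color per cut)
```

A geom then only has to read the aesthetics it understands from the resolved values, which is why a line layer can accept linetype while a point layer ignores it.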

Layers

ggplot lets you combine different types of visualization components (or layers) by adding them together with the + operator. I think this is easiest to understand with an example.

Start with a blank canvas.


In [3]:
p = ggplot(aes(x='date', y='beef'), data=meat)
p


Out[3]:
<ggplot: (270975377)>

Add some points.


In [17]:
p + geom_point()


Out[17]:
<ggplot: (270975377)>

Add a line.


In [5]:
p + geom_point() + geom_line()


Out[5]:
<ggplot: (270975377)>
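Every cell here starts again from p, and layers from earlier cells don't pile up, which suggests that + returns a new plot rather than modifying p in place. The toy class below is a hypothetical sketch of that design (ToyPlot and its string "layers" are made up for illustration; this is not ggplot's source): __add__ builds a new plot with one more layer and leaves the original untouched.

```python
class ToyPlot:
    """Hypothetical stand-in for a plot: a mapping, some data, and a tuple of layers."""

    def __init__(self, mapping, data, layers=()):
        self.mapping = mapping
        self.data = data
        self.layers = tuple(layers)  # a tuple, so layers can't be appended in place

    def __add__(self, layer):
        # Return a new plot with one extra layer; self is left unchanged.
        return ToyPlot(self.mapping, self.data, self.layers + (layer,))

    def __repr__(self):
        return f"<ToyPlot layers={list(self.layers)}>"


p = ToyPlot({"x": "date", "y": "beef"}, data="meat")
with_line = p + "geom_point" + "geom_line"  # evaluated left to right

print(p)          # → <ToyPlot layers=[]>
print(with_line)  # → <ToyPlot layers=['geom_point', 'geom_line']>
```

Returning a new object is what lets one base plot be reused for several variations, as the cells in this section do with p.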

Add a trendline.


In [6]:
p + geom_point() + geom_line() + stat_smooth(color='blue')


Out[6]:
<ggplot: (270975377)>

Or drop the line and smooth the points with a 12-point moving average instead.


In [15]:
p + geom_point(color='black') + stat_smooth(method='ma', window=12, color='royalblue')


Out[15]:
<ggplot: (270975377)>
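method='ma' with window=12 smooths the series with a moving average over 12 points. Here is a minimal pure-Python sketch of a trailing moving average, where each smoothed value is the mean of the current point and the window - 1 points before it. The moving_average function is hypothetical and not part of ggplot, and whether ggplot aligns its window this way (trailing rather than centered) is an assumption.

```python
def moving_average(values, window):
    """Trailing moving average: the mean of each full run of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < window:
        return []  # not enough points for even one full window
    running = sum(values[:window])  # sum of the first full window
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the newest value, drop the oldest one.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


print(moving_average([1, 2, 3, 4, 5, 6], window=3))  # → [2.0, 3.0, 4.0, 5.0]
```

With window=12, the first 11 points have no full window behind them, so a sketch like this returns 11 fewer values than it receives; a plotting library may handle those edges differently.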
